Motif Discovery through Predictive Modeling of Gene Regulation
We present MEDUSA, an integrative method for learning motif models of
transcription factor binding sites by incorporating promoter sequence and gene
expression data. We use a modern large-margin machine learning approach, based
on boosting, to enable feature selection from the high-dimensional search space
of candidate binding sequences while avoiding overfitting. At each iteration of
the algorithm, MEDUSA builds a motif model whose presence in the promoter
region of a gene, coupled with activity of a regulator in an experiment, is
predictive of differential expression. In this way, we learn motifs that are
functional and predictive of regulatory response rather than motifs that are
simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model
of the transcriptional control logic that can predict the expression of any
gene in the organism, given the sequence of the promoter region of the target
gene and the expression state of a set of known or putative transcription
factors and signaling molecules. Each motif model is either a k-length
sequence, a dimer, or a PSSM that is built by agglomerative probabilistic
clustering of sequences with similar boosting loss. By applying MEDUSA to a set
of environmental stress response expression data in yeast, we learn motifs
whose ability to predict differential expression of target genes outperforms
motifs from the TRANSFAC dataset and from a previously published candidate set
of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed
binding sites associated with environmental stress response from the
literature.
Comment: RECOMB 200
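The core idea of the abstract above, a boosting loop whose weak rules pair a promoter motif with a regulator's expression state, can be illustrated with a small sketch. This is not the authors' implementation: it swaps in a plain confidence-rated boosting scheme with abstaining rules, uses only exact k-mers (no dimers or PSSMs), and every name in it (`boost_motif_rules`, `predict`, the toy genes, experiments, and regulator `R1`) is hypothetical.

```python
import math
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of a promoter sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def boost_motif_rules(promoters, regulator_states, labels, k=3, rounds=2, eps=1e-3):
    """Greedy boosting over rules 'motif in promoter AND regulator in state'.

    promoters:        gene -> promoter sequence (str)
    regulator_states: experiment -> {regulator: +1 (up), -1 (down), 0 (baseline)}
    labels:           (gene, experiment) -> +1 (up-regulated) or -1 (down-regulated)
    Returns a list of (motif, regulator, state, alpha) rules.
    """
    examples = list(labels)
    weights = {ex: 1.0 / len(examples) for ex in examples}
    gene_kmers = {g: kmers(s, k) for g, s in promoters.items()}
    motifs = sorted(set().union(*gene_kmers.values()))
    regulators = sorted({r for st in regulator_states.values() for r in st})
    candidates = list(product(motifs, regulators, (+1, -1)))

    def fires(rule, gene, exp):
        motif, reg, state = rule
        return motif in gene_kmers[gene] and regulator_states[exp].get(reg, 0) == state

    model = []
    for _ in range(rounds):
        best = None
        for rule in candidates:
            w_up = sum(weights[(g, e)] for g, e in examples
                       if fires(rule, g, e) and labels[(g, e)] > 0)
            w_down = sum(weights[(g, e)] for g, e in examples
                         if fires(rule, g, e) and labels[(g, e)] < 0)
            # Weights sum to 1, so the rest is the mass on which the rule abstains.
            w_abstain = max(0.0, 1.0 - w_up - w_down)
            # Boosting loss for an abstaining rule; smaller is better.
            z = w_abstain + 2.0 * math.sqrt(w_up * w_down)
            if best is None or z < best[0]:
                best = (z, rule, w_up, w_down)
        _, rule, w_up, w_down = best
        # Smoothed confidence: positive predicts up, negative predicts down.
        alpha = 0.5 * math.log((w_up + eps) / (w_down + eps))
        model.append(rule + (alpha,))
        # Down-weight examples the rule gets right, up-weight its mistakes.
        for g, e in examples:
            if fires(rule, g, e):
                weights[(g, e)] *= math.exp(-alpha * labels[(g, e)])
        total = sum(weights.values())
        for ex in weights:
            weights[ex] /= total
    return model


def predict(model, promoter, states):
    """Sum the confidences of all rules that fire; the sign is the predicted direction."""
    return sum(alpha for motif, reg, state, alpha in model
               if motif in promoter and states.get(reg, 0) == state)


# Toy data (entirely made up): genes carrying "CGT" follow regulator R1.
toy_promoters = {"g1": "AAACGT", "g2": "TTTCGT", "g3": "GGGGGG"}
toy_states = {"e1": {"R1": +1}, "e2": {"R1": -1}}
toy_labels = {("g1", "e1"): +1, ("g2", "e1"): +1, ("g3", "e1"): -1,
              ("g1", "e2"): -1, ("g2", "e2"): -1, ("g3", "e2"): -1}
model = boost_motif_rules(toy_promoters, toy_states, toy_labels, k=3, rounds=2)
```

On the toy data the loop selects rules built on the shared motif rather than on k-mers that merely occur in one promoter, which mirrors the abstract's distinction between predictive and overrepresented motifs. Because the learned rules refer only to sequence content and regulator state, `predict` can be applied to a promoter that was not in the training set.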
Evolutionary Algorithm Based on New Crossover for the Biclustering of Gene Expression Data
Microarrays are a recent multidisciplinary technology. They measure the expression levels of several genes under different biological conditions, which generates large amounts of data. These data can be analyzed with biclustering methods to identify groups of genes that behave similarly under specific groups of conditions.
This paper proposes a new evolutionary algorithm, based on a new crossover method, dedicated to the biclustering of gene expression data. The proposed crossover method ensures the creation of new biclusters of better quality. To evaluate its performance, an experimental study was carried out on real microarray datasets. The experiments show that our algorithm extracts high-quality biclusters with highly correlated genes that are particularly involved in specific ontology structure.